A Practical Survey on Faster and Lighter Transformers

Authors

Abstract

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model solely based on the attention mechanism that is able to relate any two positions of the input sequence, hence modelling arbitrary long dependencies. The Transformer has improved the state-of-the-art across numerous sequence modelling tasks. However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length, hindering its adoption. Fortunately, the deep learning community has always been interested in improving the models' efficiency, leading to a plethora of solutions such as parameter sharing, pruning, mixed-precision, and knowledge distillation. Recently, researchers have directly addressed the Transformer's limitation by designing lower-complexity alternatives such as the Longformer, Reformer, Linformer, and Performer. However, due to the wide range of solutions, it has become challenging for practitioners to determine which methods to apply in practice in order to meet the desired trade-off between capacity, computation, and memory. This survey addresses this issue by investigating popular approaches to make Transformers faster and lighter and by providing a comprehensive explanation of the methods' strengths, limitations, and underlying assumptions.
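
To make the quadratic bottleneck concrete, below is a minimal NumPy sketch of standard scaled dot-product attention, the operation that the lower-complexity alternatives (Longformer, Reformer, Linformer, Performer) approximate. This is an illustration only, not code from the survey; the intermediate (n, n) score matrix is the term that grows quadratically with the sequence length n.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Vanilla attention (Vaswani et al.): softmax(Q K^T / sqrt(d)) V.

    Q, K, V have shape (n, d) for a length-n sequence. The intermediate
    (n, n) score matrix is the source of the Transformer's quadratic
    time and memory complexity in n.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # shape (n, n)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # shape (n, d)

# Doubling n quadruples the attention matrix:
# n = 1024 -> 4 MiB, n = 2048 -> 16 MiB in float32.
n, d = 1024, 64
Q = K = V = np.random.randn(n, d).astype(np.float32)
out = scaled_dot_product_attention(Q, K, V)         # (1024, 64)
```

Efficient alternatives replace the dense (n, n) softmax with sparse, low-rank, or kernel-based approximations, bringing the cost down to roughly O(n log n) or O(n).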


Similar Resources

Image Semantic Transformation: Faster, Lighter and Stronger

We propose the Image-Semantic-Transformation-Reconstruction-Circle (ISTRC) model, a novel and powerful method that uses FaceNet's Euclidean latent space to understand images. As the name suggests, ISTRC constructs a circle that is able to perfectly reconstruct images. One powerful Euclidean latent space embedded in ISTRC is FaceNet's last layer, with its power of distinguishing and understanding images. Our...

Faster and Lighter Phrase-based Machine Translation Baseline

This paper describes the SENSE machine translation system's participation in the Third Workshop on Asian Translation (WAT2016). We share our best practices for building fast and light phrase-based machine translation (PBMT) models whose results are comparable to the baseline systems provided by the organizers. As Neural Machine Translation (NMT) overtakes PBMT as the state-of-the-art, deep learning...

A Head Parameter Survey on Mazandarani Dialect and Its Effect(s) on Learning English from a CA Perspective (on the Basis of X-bar Syntax)

There has been a gradual shift of focus from the study of rule systems, which have increasingly been regarded as impoverished, … to the study of systems of principles, which appear to occupy a much more central position in determining the character and variety of possible human languages. There is a set of absolute universals, notions and principles existing in UG which do not vary from one ...

A Survey on Practical Numbers

A positive integer m is said to be practical if every integer n ∈ (1, m) is a sum of distinct positive divisors of m. In this paper we give an equivalent definition of practical number and describe some arithmetical properties of practical numbers, showing a remarkable analogy with primes. We give an improvement of the estimate of the gap between consecutive practical numbers and prove the exist...
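
As a concrete illustration of this definition, here is a small brute-force Python check (an illustration only, not the paper's equivalent definition or its gap estimates):

```python
def is_practical(m: int) -> bool:
    """Check the definition directly: every integer n in (1, m) must be a
    sum of distinct positive divisors of m. Brute-force subset sums,
    suitable only for small m."""
    divisors = [d for d in range(1, m + 1) if m % d == 0]
    sums = {0}
    for d in divisors:
        sums |= {s + d for s in sums}  # all subset sums of the divisors
    return all(n in sums for n in range(2, m))

# The first few practical numbers:
print([m for m in range(1, 30) if is_practical(m)])
# [1, 2, 4, 6, 8, 12, 16, 18, 20, 24, 28]
```

For example, 12 is practical because its divisors 1, 2, 3, 4, 6 cover every n in (1, 12), e.g. 5 = 1 + 4 and 11 = 1 + 4 + 6, whereas 10 is not, since no set of distinct divisors of 10 sums to 4.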

Journal

Journal title: ACM Computing Surveys

Year: 2023

ISSN: 0360-0300, 1557-7341

DOI: https://doi.org/10.1145/3586074